It appears that by 2021, the term “coronavirus” had become a fairly uncommon way to refer to the virus, with “COVID” becoming the reference of choice. It also seems that, over this timeframe, messaging about staying home, social distancing, and Trump died down and became less distinct from the many other COVID-19-related topics being discussed.
30 Mar 2020 Mean Tweet Length: 148.77
30 Mar 2021 Mean Tweet Length: 164.19
We check whether there is a difference in tweet lengths between the two samples.
Tweets from March 2020 were on average shorter than those from 2021. It seems that there is a greater share of longer tweets (~300 characters) in 2021.
The difference in mean tweet length is statistically significant: the p-value obtained from the one-sided, unpaired two-sample t-test is much less than 0.01. The difference is substantial as well: tweets from March 2020 were about fifteen characters shorter on average (148.77 vs. 164.19).
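As a rough illustration of the test described above, here is a minimal Python sketch of a one-sided, unpaired two-sample t-test. It assumes unequal variances (the Welch variant), which the text does not specify, and it approximates the p-value with a normal distribution, which is only reasonable for large samples. The function name and the toy tweet lengths are hypothetical and are not the report's data.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test_less(sample_a, sample_b):
    """One-sided Welch test of H1: mean(sample_a) < mean(sample_b).

    Returns (t statistic, Welch-Satterthwaite degrees of freedom,
    approximate p-value). The p-value uses a normal approximation,
    so it is only a rough guide for small samples.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)
    # Squared standard error of the difference in means.
    se_sq = var_a / n_a + var_b / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_sq)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    # Lower-tail probability: small when sample_a is clearly shorter.
    p_value = NormalDist().cdf(t_stat)
    return t_stat, df, p_value


# Hypothetical toy tweet lengths (characters), not the report's data.
lengths_2020 = [120, 140, 150, 135, 160, 145, 130, 155]
lengths_2021 = [150, 170, 165, 180, 160, 175, 155, 185]

t_stat, df, p_value = welch_t_test_less(lengths_2020, lengths_2021)
print(f"t = {t_stat:.2f}, df = {df:.1f}, approx. p = {p_value:.5f}")
```

A negative t statistic with a small p-value would support the claim that the first sample is shorter on average.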
We join the NRC Word-Emotion Association Lexicon to our data, which allows us to identify words associated with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive).
We produce visualizations comparing the sentiments being expressed in each sample period.
Compared to our 2020 tweets, the 2021 tweets express less trust, less surprise, more sadness, more positivity, less joy, more fear, less disgust, less anticipation, and less anger.
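The lexicon join behind this comparison can be sketched as follows. This is only an illustration in Python: the miniature lexicon is a hypothetical stand-in whose entries are made up for the example and are not actual NRC entries, and `emotion_shares` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical miniature stand-in for an emotion lexicon:
# each word maps to the set of categories it is tagged with.
# These entries are illustrative only, not real NRC data.
toy_lexicon = {
    "hope": {"anticipation", "joy", "positive", "trust"},
    "death": {"fear", "sadness", "negative"},
    "virus": {"negative"},
}


def emotion_shares(tokens, lexicon):
    """Return each category's share of all lexicon matches in `tokens`."""
    counts = Counter()
    for token in tokens:
        # Words absent from the lexicon contribute nothing.
        for category in lexicon.get(token, ()):
            counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


tokens = ["hope", "virus", "death", "hope", "mask"]
shares = emotion_shares(tokens, toy_lexicon)
for category, share in sorted(shares.items()):
    print(f"{category}: {share:.3f}")
```

Comparing shares rather than raw counts keeps the two sample periods comparable even if they contain different numbers of matched words.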
---
Next, we join the AFINN sentiment lexicon, a list of English terms manually rated for valence with an integer between -5 (negative) and +5 (positive) by Finn Årup Nielsen between 2009 and 2011. We use this lexicon to compute mean positivity scores for all words tweeted in each sample year.
The tweets from 2021 are slightly more positive, but the difference appears negligible.
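The mean positivity score can be sketched in a few lines of Python. The valence dictionary below is a hypothetical toy stand-in in the AFINN format (integers from -5 to +5) and is not guaranteed to match the real lexicon. Words missing from the lexicon are skipped, which is an assumption about how unmatched words are handled.

```python
from statistics import mean

# Hypothetical toy valence lexicon in the AFINN style (-5 to +5).
# Values are illustrative and may not match the real AFINN list.
toy_valence = {"good": 3, "bad": -3, "love": 3, "crisis": -3, "hope": 2}


def mean_valence(tokens, lexicon):
    """Average valence over the tokens that appear in the lexicon.

    Returns None when no token is matched, so an empty result is
    not mistaken for a neutral score of zero.
    """
    scored = [lexicon[token] for token in tokens if token in lexicon]
    return mean(scored) if scored else None


sample_tokens = ["good", "bad", "hope", "mask"]
score = mean_valence(sample_tokens, toy_valence)
print(score)
```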
| topic1 | topic2 | topic3 | topic4 | topic5 | topic6 | topic7 | topic8 | topic9 | topic10 |
|---|---|---|---|---|---|---|---|---|---|
| covid | covid | pandem | corona | lockdown | stay | coronavirus | china | deliv | #coronavirus |
| death | pandem | trump | virus | @drtedro | home | covid | virus | #covid19 | #covid |
| coronavirus | coronavirus | coronavirus | lockdown | world | social | health | trump | support | #covid19 |
| test | call | @realdonaldtrump | shit | god | distanc | mask | peopl | offici | read |
| week | time | presid | time | @who | peopl | hospit | spread | act | #stayhom |
| die | nation | peopl | day | coronavirus | safe | care | world | ll | quarantin |
| infect | due | live | hope | follow | friend | fight | chines | sign | day |
| report | busi | news | famili | bad | month | worker | countri | senat | post |
| york | respons | media | don | india | fuck | patient | @realdonaldtrump | copi | #socialdistanc |
| updat | pay | american | fuck | @ladygaga | april | protect | blame | repres | watch |
| posit | countri | die | peopl | pandem | don | medic | stop | @govkemp | video |
| rate | question | real | covid | student | day | doctor | govern | #gapol | share |
| confirm | crisi | global | wait | thousand | live | take | lie | @rondesantisfl | #quarantinelif |
| dr | person | watch | love | lot | time | save | call | #sayfi | stori |
| total | govern | respons | die | game | week | ventil | travel | #flapol | time |
| million | school | #coronavirus | gonna | feel | hous | crisi | hoax | parent | check |
| gt | issu | listen | start | readi | hand | nurs | america | beach | link |
| counti | continu | stop | kill | univers | extend | donat | start | middl | sunday |
| hospit | amid | brief | ve | covid19 | practic | healthcar | ban | @scgovernorpress | book |
| flu | move | press | job | usa | sick | line | suppli | #scpolit | play |
| Var1 | Freq |
|---|---|
| topic1 | 452 |
| topic2 | 434 |
| topic3 | 565 |
| topic4 | 581 |
| topic5 | 377 |
| topic6 | 666 |
| topic7 | 419 |
| topic8 | 473 |
| topic9 | 230 |
| topic10 | 397 |
Now, we use the quanteda package’s implementation of topic modeling to identify what themes/discussions are prevalent in each year. Underlying this topic modeling implementation is Latent Dirichlet allocation (LDA), a machine learning algorithm that learns clusters of words that tend to occur together (topics). Tweets, therefore, are understood as heterogeneous mixtures of these topics. For each tweet, the model assigns a probability to every topic, and we assume that the topic with the highest probability is the focus of the tweet.
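The "highest-probability topic" rule can be illustrated with a short Python sketch. The per-tweet topic probabilities below are hypothetical values chosen for the example, not output from the fitted model, and `dominant_topic` is a hypothetical helper name.

```python
from collections import Counter


def dominant_topic(topic_probs):
    """Return the topic label with the highest probability."""
    return max(topic_probs, key=topic_probs.get)


# Hypothetical document-topic probabilities (each row sums to 1).
doc_topic = {
    "tweet_a": {"topic1": 0.70, "topic2": 0.20, "topic3": 0.10},
    "tweet_b": {"topic1": 0.15, "topic2": 0.25, "topic3": 0.60},
    "tweet_c": {"topic1": 0.55, "topic2": 0.30, "topic3": 0.15},
}

# Assign each tweet to its single most probable topic.
assignments = {doc: dominant_topic(probs) for doc, probs in doc_topic.items()}

# Tally tweets per topic, analogous to a Var1/Freq frequency table.
topic_counts = Counter(assignments.values())
print(assignments)
print(topic_counts)
```

This hard assignment discards the mixture information, so a tweet that is split almost evenly between two topics is still counted under only one of them.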
While these topics may initially seem to make little sense, there are some patterns we can pick out.
Topic 1 seems to be the informational topic concerning outbreaks, data, hospitalizations, deaths, etc.
Topic 2, it would seem, is focused on the crisis’ impact on the nation: businesses, governments, schools, and people.
Topic 3 appears to be focused on Trump. More specifically, it seems to be about media-related topics such as press briefings, live news, etc. Topic 3 was among the most frequently discussed topics, as a lot of attention was focused on the president.
Topic 4 seems to encompass emotionally intense tweets reflecting fear, anger, and hope. It includes multiple curse words along with words like “love,” “hope,” “kill,” and “die.”
Topic 5 is puzzling; there is no apparent connection between the WHO, Dr. Tedros, and Lady Gaga. We eventually found out that these words correspond to a topic that was trending on 30 Mar 2020: a phone call between WHO Director Dr. Tedros and Lady Gaga. See here: https://twitter.com/drtedros/status/1244008665251708929?lang=en
Topic 6 encompasses tweets urging social distancing.
Topic 7 is the most distinct, as it clearly focuses on the health care situation: mask and ventilator shortages, risks posed to doctors and nurses, and inadequate testing.
Topic 8 centers on China. If we piece together the words, it seems that some of the tweets likely discuss whether it is apt to blame China (note that one of the keywords is “stop”). Additional terms include “travel,” “ban,” “hoax,” and “lie,” which together suggest that conversations centered on China are interwoven with virus skepticism.
Topic 9 is clearly a political topic; it seems to be discussing national- and state-level policies. For instance, it includes Governor Kemp of Georgia (GA) and the hashtag #gapol, Governor Ron DeSantis of Florida (FL) and the hashtag #flapol, the South Carolina governor press and SC-related hashtags, and, at a national level, the Senate and representatives.
Topic 10 seems to focus on things people are doing at home while quarantining, such as reading, watching videos, and playing: “#stayhom,” “#quarantinelif,” “read,” “play,” “watch,” “video,” and “book.” According to the frequency table, it was assigned to 397 tweets, which ranks it eighth of the ten topics, but it still suggests that many people were tweeting about their day-to-day experience in quarantine.
| topic1 | topic2 | topic3 | topic4 | topic5 | topic6 | topic7 | topic8 | topic9 | topic10 |
|---|---|---|---|---|---|---|---|---|---|
| #covid19 | home | covid | pandem | vaccin | pandem | covid | mask | covid | pandem |
| virus | covid | lockdown | post | covid | trump | cdc | peopl | death | covid |
| #covid | stay | time | link | week | respons | biden | wear | test | peopl |
| #coronavirus | school | watch | #covid19 | shot | covid | american | social | peopl | fuck |
| dr | time | support | learn | effect | fauci | news | pandem | virus | feel |
| concern | live | month | servic | dose | bad | doom | distanc | day | love |
| base | kid | day | share | risk | tweet | border | care | report | shit |
| plan | children | due | check | million | situat | impend | don | posit | don |
| @potus | safe | play | citi | receiv | birx | director | stop | hospit | talk |
| evict | break | busi | music | immun | access | warn | continu | rate | ve |
| mar | famili | close | promo | prevent | origin | die | health | die | time |
| #vaccin | student | restrict | world | read | call | presid | covid | china | guy |
| #ccpvirus | due | brisban | websit | coronavirus | amount | thousand | control | flu | take |
| alarm | nurs | game | market | studi | histori | passport | public | lab | god |
| tax | week | quarantin | leader | health | reach | trump | complet | patient | hope |
| transmiss | travel | start | live | elig | blame | travel | scienc | coronavirus | happen |
| peter | line | local | read | uk | rt | america | life | increas | make |
| fund | women | ago | promot | appoint | dr | surg | busi | updat | day |
| navarro | job | season | recent | pfizer | pm | top | data | wuhan | start |
| extend | person | lock | sign | develop | great | illeg | rule | counti | real |
| Var1 | Freq |
|---|---|
| topic1 | 323 |
| topic2 | 521 |
| topic3 | 508 |
| topic4 | 398 |
| topic5 | 670 |
| topic6 | 358 |
| topic7 | 555 |
| topic8 | 595 |
| topic9 | 654 |
| topic10 | 652 |
The topics from 2021 are harder to interpret.
Topic 1 mentions the Peter Navarro scandal, wherein he—a Trump advisor—allegedly personally profited from questionable/corrupt COVID-19 vaccine investment decisions. None of the other words capture a distinct theme; there are also mentions of taxes, the POTUS, China blaming, etc.
Topic 2 discusses the handling of children and families with respect to schools, travel, and jobs.
Topic 3 seems to be talking about a sports game featuring Brisbane, which people were presumably tuning into. This inference is based on the following words: “watch,” “play,” “brisban,” “game,” and “season.”
We cannot make sense of topic 4.
Topic 5 focuses on the vaccine rollout.
Topic 6 seems focused on Deborah Birx’s comments right around that time, when she claimed that most COVID-19 deaths could have been prevented by Trump and Fauci.
Topic 7 concerns the border crisis—the surge of illegal migrants coming to the US from Mexico, which possibly raises public health fears.
Topic 8 encourages mask wearing and social distancing.
Topic 9 seems informational and is focused partly on the COVID situation in China, though the mention of “lab” suggests that this topic also captures conspiracy theories.
Topic 10 is the emotional topic, with angry curse words and words like “love,” “feel,” “hope,” and “God.”
---
We examine this question for both March of 2020 and 2021. We use the newsmap model as described on the quanteda package website: https://tutorials.quanteda.io/machine-learning/newsmap/
After formatting the data into country-level document feature matrices, we show the estimated number of mentions for each country.
It appears that for both datasets, the most mentions concern the United States. However, in 2020, a greater share of attention is centered on China than on the next two most mentioned locations (Britain and Canada).
We can visualize this using geographic heatmaps.
As you can see, China is brighter in the 2020 map (as are India and Australia, to an extent), which indicates a higher frequency of mentions.
In 2021, the English-speaking Twitter user-base seems to be slightly more focused on its home countries than on China, though China still receives substantial attention.
| coef(newsmap20)$US | |
|---|---|
| americans | 6.637461 |
| american | 6.364167 |
| wtf | 2.115672 |
| propaganda | 1.997889 |
| dangerous | 1.997889 |
| scientists | 1.710207 |
|  | 1.710207 |
| relief | 1.710207 |
| irresponsible | 1.710207 |
| citizens | 1.619235 |
| coef(newsmap20)$CN | |
|---|---|
| china | 7.787976 |
| chinese | 6.038776 |
| tons | 4.824332 |
| communist | 3.083866 |
| wing | 2.978505 |
| february | 2.909513 |
| january | 2.573040 |
| blocked | 2.467680 |
| racist | 2.365401 |
| supplies | 2.321726 |
| coef(newsmap20)$GB | |
|---|---|
| uk | 6.646022 |
| ventilator | 2.200434 |
| son | 1.912752 |
| street | 1.602597 |
| write | 1.602597 |
| building | 1.602597 |
| usa | 1.563377 |
| police | 1.538059 |
| note | 1.507287 |
| economic | 1.507287 |
| coef(newsmap20)$CA | |
|---|---|
| canada | 6.455853 |
| recovered | 1.770640 |
| buy | 1.696532 |
| security | 1.696532 |
| mo | 1.627539 |
| difficult | 1.627539 |
| wall | 1.627539 |
| leading | 1.627539 |
| write | 1.627539 |
| spend | 1.627539 |
| coef(newsmap21)$US | |
|---|---|
| americans | 6.946976 |
| american | 6.253829 |
| washington | 5.427150 |
| bless | 3.124565 |
| saved | 2.921624 |
| facilities | 2.094946 |
| borders | 1.900790 |
| reopen | 1.776492 |
| died | 1.689481 |
| elected | 1.689481 |
| coef(newsmap21)$CN | |
|---|---|
| china | 7.180347 |
| chinese | 6.299148 |
| origins | 3.519638 |
| theory | 3.268324 |
| obama | 2.931852 |
| animal | 2.421026 |
| lab | 2.325716 |
| believes | 2.238704 |
| wuhan | 2.174166 |
| investigation | 2.120922 |
| coef(newsmap21)$GB | |
|---|---|
| uk | 6.845900 |
| eu | 3.359545 |
| grow | 2.771758 |
| worry | 2.771758 |
| strain | 2.638227 |
| az | 2.297300 |
| delay | 1.904257 |
| ireland | 1.904257 |
| simple | 1.904257 |
| struggling | 1.808947 |
| coef(newsmap21)$CA | |
|---|---|
| canada | 6.664331 |
| astrazeneca | 2.341145 |
| hotel | 2.069211 |
| simple | 1.963851 |
| book | 1.868541 |
| national | 1.609679 |
| ford | 1.558386 |
| ridiculous | 1.558386 |
| increasing | 1.558386 |
| doubt | 1.558386 |
For each of the two sample periods, we would like to look at what words are associated with each country. We specifically look at the four most mentioned countries in each dataset: the US, China, Great Britain, and Canada.
2020 Interpretation
It is difficult to make sense of these words, but a few have obvious meanings. There seems to be a lot of focus on expertise in the US, with words like “scientists” and “propaganda.” There is also discussion of relief bills and of certain people being irresponsible.
The China conversation centers around communism, racism, and international travel—to Europe and India.
The Britain-/Canada-related words don’t show any obvious themes, but it seems that the ventilator shortage in the UK was one salient topic.
2021 Interpretation
It is clear that the conversations surrounding these countries have changed in the past year. Rochelle Walensky, the new CDC Director, is one term that stands out. Others are the discussion of the country reopening, energy policy, and borders. All of these terms are more specific than the general focus on scientists and experts that we saw in 2020. Perhaps our national fog is clearing as our country distributes the vaccine and progress is made.
The China discussion still centers on theories about the origin of COVID-19, with mentions of a lab, bats, and Wuhan. Racism is still a common topic, especially given the recent hate crimes in the US.
The UK mentions occur in the contexts of relations with the EU, worries about a new strain, and the AstraZeneca vaccine (“az”). Concerning Canada, we also see talk of the AstraZeneca vaccine, which recently rolled out there. The other words listed are less easy to interpret.
2020
Obama was the most-followed person being retweeted on that day, and it seems that Katy Perry was second. Indian PM Narendra Modi was also being retweeted at the time, along with several news outlets, politicians, and celebrities.
2021
In 2021, it seems that Obama was far and away the most-followed person being retweeted, with nobody else coming close. News outlets make up a greater share of the top 20 by follower count, perhaps because there were fewer celebrities talking about COVID-19 in March of 2021 than in 2020.
| ScreenName | rt_total |
|---|---|
| moreki_mo | 369869 |
| SethAbramson | 236566 |
| ChicagoTraderrr | 204214 |
| TechInsider | 184160 |
| siravariety | 176882 |
| a_new_hopee | 144919 |
| SinghLions | 141862 |
| _caitlingeorgia | 133052 |
| JoeBiden | 128372 |
| MLKChefLean | 126289 |
| dglo4me | 118316 |
| CorneliaLG | 114712 |
| BarackObama | 111313 |
| comiketofficial | 110311 |
| sin_xia | 101894 |
| FaveEngineerJen | 101459 |
| b0mbchell_ | 93733 |
| emmabethgall | 90161 |
| jeremycyoung | 89643 |
| quenblackwell | 85545 |
| ScreenName | rt_total |
|---|---|
| JRKSB_ | 140677 |
| Marco_Acortes | 91265 |
| Mippcivzla | 60549 |
| JoeBiden | 59563 |
| BIGHIT_MUSIC | 47090 |
| NicolasMaduro | 40822 |
| aj_buu | 39485 |
| KatPapaJohns | 38968 |
| leelecarvalho_ | 38215 |
| Mikel_Jollett | 37101 |
| wwxwashere | 35973 |
| 843KT | 35627 |
| __Jones__ | 32319 |
| Ric3townFinest | 31910 |
| VTVcanal8 | 31656 |
| mordomoeugenio | 28083 |
| BarackObama | 25413 |
| tattyhassan | 24946 |
| Mediavenir | 22708 |
| DanPriceSeattle | 21799 |
For both lists, most of these names are not particularly recognizable with the exception of Joe Biden, Barack Obama, and Nicolas Maduro (2021).
| ScreenName | Followers | Retweets | n | rt_index |
|---|---|---|---|---|
| ElNacionalWeb | 5147877 | 6 | 3 | 4.0e-07 |
| ndtv | 15088045 | 71 | 4 | 1.2e-06 |
| Reuters | 23385704 | 200 | 5 | 1.7e-06 |
| guardian | 9702293 | 54 | 3 | 1.9e-06 |
| la_patilla | 7067042 | 83 | 5 | 2.3e-06 |
| radiomitre | 1054776 | 6 | 2 | 2.8e-06 |
| lasopa_news | 447945 | 4 | 3 | 3.0e-06 |
| kompascom | 8139885 | 146 | 6 | 3.0e-06 |
| CGTNOfficial | 13614960 | 142 | 3 | 3.5e-06 |
| elnorte | 1017073 | 8 | 2 | 3.9e-06 |
| ScreenName | Followers | Retweets | n | rt_index |
|---|---|---|---|---|
| detikcom | 16866238 | 6 | 9 | 0e+00 |
| TheEconomist | 25665215 | 9 | 4 | 1e-07 |
| FinancialTimes | 6956127 | 4 | 5 | 1e-07 |
| SSalud_mx | 1239631 | 3 | 8 | 3e-07 |
| latimes | 3813280 | 11 | 8 | 4e-07 |
| Independent | 3557651 | 7 | 3 | 7e-07 |
| eleconomista | 721345 | 1 | 2 | 7e-07 |
| HoustonChron | 648473 | 1 | 2 | 8e-07 |
| CGTNOfficial | 13614961 | 32 | 3 | 8e-07 |
| DolarToday | 3749456 | 20 | 6 | 9e-07 |
2020
ElNacionalWeb, NDTV, Reuters, and the Guardian are all terrible at getting retweeted, which makes sense because they likely tweet a lot of boring, matter-of-fact news as opposed to the clickbait-y headlines that sites like Fox or the New York Times tweet.
2021
Again, we see mostly news sites failing to get many retweets. There isn’t anything too interesting to be said about this.
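The text does not define the `rt_index` column, but the tabulated values are consistent with retweets divided by the product of followers and `n`. The Python sketch below checks that reading against three rows of the 2020 table. Both the formula and the interpretation of `n` as the number of tweets by the account in the sample are assumptions inferred from the table, not something the report states.

```python
def rt_index(retweets, followers, n_tweets):
    """Assumed definition: retweets per follower per tweet.

    Inferred from the tabulated values; the report does not
    state the formula explicitly.
    """
    return retweets / (followers * n_tweets)


# (ScreenName, Followers, Retweets, n) copied from the 2020 table.
rows_2020 = [
    ("ElNacionalWeb", 5147877, 6, 3),
    ("ndtv", 15088045, 71, 4),
    ("Reuters", 23385704, 200, 5),
]

indices = {
    name: rt_index(retweets, followers, n)
    for name, followers, retweets, n in rows_2020
}
for name, value in indices.items():
    print(f"{name}: {value:.1e}")
```

Under this reading, a low index means an account earns few retweets relative to both its audience size and how often it tweeted in the sample.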
2020
It seems that topic 9 was being retweeted about the most, which is very interesting because topic 9 had the fewest original tweets out of all topics we identified from the 2020 data.
Recall that topic 9 was about politics. Perhaps people tend to promote the views of media outlets or political influencers whose content they consume but don’t have much to personally contribute to these conversations.
It may also be the case that retweeting and publishing original tweets are zero-sum behaviors. In other words, when people are more likely to retweet about a particular topic, they may be less likely to also tweet about it themselves. Perhaps others have summed up their thoughts better than they could convey them.
We will see if this pattern is also seen in the 2021 data.
2021
This hypothesized explanation is supported by the 2021 data; topic 1 is the most retweeted about but is the least originally tweeted about. Recall that topic 1 was a sort of catch-all alarmist topic, though it mentions Peter Navarro. It may be the case that retweets displace tweets about the same topic.
30 Mar 2020 Mean Retweeted Tweet Length: 188.46
30 Mar 2021 Mean Retweeted Tweet Length: 195.02
We see that 2020 retweets are slightly shorter and conduct a t-test.
It seems the difference in tweet lengths is statistically significant. 2020 tweets (those being retweeted) were slightly shorter.
The sentiments observed are not strikingly different, but there is more sadness, less surprise, more negative sentiments, and less joy being expressed in 2021—over a year from the start of the pandemic in the US.
Average AFINN scores for all retweeted words by date
30 Mar 2020: -0.525
30 Mar 2021: -0.608
The 2021 retweets appear to be more negatively valenced.